1995-06-19
Service C++ functions and classes
dealing mostly with "advanced" i/o and arithmetic compression
***** For the version history, read on
***** Verification files: vendian_io, vhistogram, varithm
Don't forget to compile and run them; see the comments in the Makefile
for details. The verification code checks that all the functions
in this package have compiled and run correctly. The code can also
serve as an example of how the package classes/functions can be used.
***** Highlights and idioms
---- Extended file names
The package adds support for "extended" file names with pipes in them.
That is, the name of a file to open may now be specified as
"| command" or "command |", i.e. as a pipe. For example,
    EndianIn istream;
    istream.open("gunzip < /tmp/aa.gz |");
    EndianOut stream("| compress > /tmp/aa.Z");
    image.write_pgm("| xv -");
The <command> is launched in a subprocess through '/bin/sh' with its
standard input/output hooked, through pipe(), to the file being
opened.
This extension is implemented at the lowest possible level, right
before the request to open a file goes to the OS (through the system
call open(2)). The function sys_open() (in the source file sys_open.cc)
acts as a "patch": if you call sys_open() instead of open() to
open a file, you get all the open() functionality plus the extended
file names.
Thus, some libg++ 2.6.2 iostream functions were modified to call
sys_open() instead of open(). To use the extended file names outside
gcc/libg++, one has to replace the open() calls with sys_open() by
hand.
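The naming convention above can be sketched as follows. This is only an illustration under stated assumptions, not the code in sys_open.cc: the names parse_extended_name() and open_for_reading() are hypothetical, and the sketch uses the POSIX popen() call (which also runs the command through /bin/sh) as a stand-in for the pipe()-based hookup the package performs.

```cpp
#include <cassert>
#include <stdio.h>    // fopen(), plus POSIX popen()/pclose()
#include <string>

// Hypothetical helpers, NOT the package's sys_open().
enum class NameKind { PlainFile, PipeFromCommand, PipeToCommand };

struct ParsedName {
    NameKind kind;
    std::string target;   // file path, or the shell command without the '|'
};

// Strip leading and trailing blanks and tabs.
static std::string trim(const std::string& s)
{
    const char* ws = " \t";
    const std::string::size_type b = s.find_first_not_of(ws);
    if (b == std::string::npos) return "";
    const std::string::size_type e = s.find_last_not_of(ws);
    return s.substr(b, e - b + 1);
}

// Decide whether a name is a plain file, "| command" or "command |".
ParsedName parse_extended_name(const std::string& name)
{
    const std::string t = trim(name);
    if (!t.empty() && t.front() == '|')     // "| command": data is written to it
        return { NameKind::PipeToCommand, trim(t.substr(1)) };
    if (!t.empty() && t.back() == '|')      // "command |": data is read from it
        return { NameKind::PipeFromCommand, trim(t.substr(0, t.size() - 1)) };
    return { NameKind::PlainFile, name };   // ordinary file name, left untouched
}

// Open a name for reading. Returns a null pointer on failure or when the
// name is the write-only "| command" form. 'is_pipe' tells the caller
// whether to release the handle with pclose() or with fclose().
FILE* open_for_reading(const std::string& name, bool& is_pipe)
{
    const ParsedName p = parse_extended_name(name);
    is_pipe = (p.kind == NameKind::PipeFromCommand);
    if (p.kind == NameKind::PipeToCommand) return nullptr;
    return is_pipe ? popen(p.target.c_str(), "r")
                   : fopen(p.target.c_str(), "r");
}
```

With this sketch, open_for_reading("gunzip < /tmp/aa.gz |", is_pipe) would hand back a FILE* connected to the standard output of the gunzip command, while a plain "/tmp/aa" goes straight to fopen().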
---- Explicit Endian I/O of short/long integers
    EndianOut stream("/tmp/aa");
    stream.set_littlendian();
    stream.write_long(1);
This writes 1 as a long integer with the least significant byte
first, NO MATTER which computer (computer architecture) the code is
running on. Using an explicit endian specification (as above) is the
only way to ensure portability of binary files containing arithmetic
data.
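To make the idea concrete, here is a minimal sketch of host-independent byte ordering. The helper names store_le32(), store_be32() and load_le32() are hypothetical and are not part of the package; the sketch only shows the general technique of assembling the bytes with shifts, so that the result never depends on how the host lays out an integer in memory.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers, not the package's EndianOut/EndianIn methods.

// Store a 32-bit value with the least significant byte first.
void store_le32(std::uint32_t value, unsigned char out[4])
{
    out[0] = static_cast<unsigned char>( value        & 0xFFu);
    out[1] = static_cast<unsigned char>((value >> 8)  & 0xFFu);
    out[2] = static_cast<unsigned char>((value >> 16) & 0xFFu);
    out[3] = static_cast<unsigned char>((value >> 24) & 0xFFu);
}

// Store a 32-bit value with the most significant byte first.
void store_be32(std::uint32_t value, unsigned char out[4])
{
    out[0] = static_cast<unsigned char>((value >> 24) & 0xFFu);
    out[1] = static_cast<unsigned char>((value >> 16) & 0xFFu);
    out[2] = static_cast<unsigned char>((value >> 8)  & 0xFFu);
    out[3] = static_cast<unsigned char>( value        & 0xFFu);
}

// Read back a value stored by store_le32().
std::uint32_t load_le32(const unsigned char in[4])
{
    return  static_cast<std::uint32_t>(in[0])
         | (static_cast<std::uint32_t>(in[1]) << 8)
         | (static_cast<std::uint32_t>(in[2]) << 16)
         | (static_cast<std::uint32_t>(in[3]) << 24);
}
```

Because the bytes are produced by arithmetic on the value rather than by copying its memory image, store_le32(1, buf) yields the bytes 01 00 00 00 on any architecture.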
---- Stream sharing
EndianIn/Out streams can share the same i/o buffer. This is useful
when one needs to read/write a "stratified" (layered) file consisting
of various variable-bit encoded data interspersed with headers. For
example, a file may begin with a header (telling the total number of
data items, normalization factors) followed by some variable-bit
encoding of items, followed by another header, followed by an
arithmetic compressed stream of data, etc. Thus, a file can be like a
waffle pie made of many layers, each layer interpreted by a different
stream, with all the streams sharing the same file and the same file
pointer. The situation is similar to sharing an open file (and a file
pointer) among parent and child (forked) processes.
Note that merely opening a stream on a dup()-ed file handle, or
sync()-ing the stream, is not quite enough. See endian_io.cc for
more discussion. The bottom line is, this package implements stream
sharing in a safe and portable way: it works on a Mac just as well as
on different flavors of UNIX.
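The layered-file idea can be illustrated with the standard iostream library, where several stream objects may be attached to one streambuf. This is only an analogy under stated assumptions: it uses std::ostream and std::stringbuf rather than EndianIn/EndianOut, the function name write_layered() is hypothetical, and the package's own sharing mechanism (see endian_io.cc) may work differently.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Two std::ostream objects attached to ONE buffer: whatever either of them
// writes lands in the same place, in the order the writes were made.
std::string write_layered()
{
    std::stringbuf shared(std::ios_base::out);
    std::ostream headers(&shared);   // stream used for the "header" layers
    std::ostream payload(&shared);   // stream used for the "data" layers

    headers << "HDR1;";
    payload << "aaaa;";
    headers << "HDR2;";
    payload << "bbbb;";

    return shared.str();             // the layers, interleaved as written
}
```

Neither stream keeps a private buffer or a private position, so the layers come out in the order they were written, which is the property a "stratified" file needs.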
---- Simple variable-length coding of short integers
The code is intended for writing a collection of short integers where
many of them are rather small in value; still, big values can crop up
at times, so we can't limit the size of the code to anything less than
16 bits. The code is a variation of a start-stop code described in
Appendix A, "Variable-length representations of the integers", of the
"Text Compression" book by T.Bell, J.Cleary and I.Witten,
pp. 290-295. The present code features support for both negative and
positive numbers, an optimization based on the fact that all
numbers are no larger than 2^15-1 in absolute value, and an assumption
that most of them are smaller than 512 (in absolute value).
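The flavor of such a code can be shown with a deliberately simplified two-class scheme. It is not the bit layout used by the package: the names encode_short() and decode_short(), the choice of a single class bit, and the use of std::vector<bool> as a bit container are all assumptions made for illustration. Small values (below 512 in absolute value) take 11 bits, everything else takes 17.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

typedef std::vector<bool> Bits;

// Append the 'count' low-order bits of 'value', most significant bit first.
static void put_bits(Bits& bits, unsigned value, int count)
{
    for (int i = count - 1; i >= 0; --i)
        bits.push_back(((value >> i) & 1u) != 0);
}

// Read 'count' bits starting at 'pos', advancing 'pos'.
static unsigned get_bits(const Bits& bits, std::size_t& pos, int count)
{
    unsigned value = 0;
    for (int i = 0; i < count; ++i)
        value = (value << 1) | (bits.at(pos++) ? 1u : 0u);
    return value;
}

// Illustrative layout: class bit, sign bit, magnitude.
//   class 0: magnitude < 512,    9 bits -> 11 bits in total
//   class 1: magnitude <= 32767, 15 bits -> 17 bits in total
void encode_short(Bits& bits, int v)
{
    const unsigned mag = static_cast<unsigned>(std::abs(v));
    assert(mag <= 32767u);            // |v| <= 2^15 - 1, as the text assumes
    const bool big = (mag >= 512u);
    bits.push_back(big);              // class bit
    bits.push_back(v < 0);            // sign bit
    put_bits(bits, mag, big ? 15 : 9);
}

int decode_short(const Bits& bits, std::size_t& pos)
{
    const bool big = bits.at(pos++);
    const bool negative = bits.at(pos++);
    const int mag = static_cast<int>(get_bits(bits, pos, big ? 15 : 9));
    return negative ? -mag : mag;
}
```

A genuine start-stop code would normally use more than two classes, so that very small values get even shorter codewords; the sketch keeps two only to stay short.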
---- Arithmetic compression of a stream of integers
The present package provides a clean C++ implementation of Bell,
Cleary and Witten's arithmetic compression code, with a clear
separation between a model and the coder. ArithmCodingIn /
ArithmCodingOut act as i/o streams that encode signed short integers
you put() to, and decode them when you get() them. The
ArithmCodingIn/Out object needs a "plug-in" of a class
Input_Data_Model when the stream is created. The Input_Data_Model
object is responsible for providing the codec with the probabilities
(frequencies) a given data item is expected to appear with, and for
finding a symbol given its cumulative frequency. Input_Data_Model may
also modify itself to account for a new symbol. Thus, the ArithmCoding
class is a sort of 'iostream' class that writes/reads data items
to/from the stream, encoding/decoding them on the way. It relies upon
the Input_Data_Model for the probabilities needed to perform the
arithmetic coding.
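The three duties of a model described above (report a symbol's frequency range, find a symbol from a cumulative frequency, adapt to a new symbol) can be sketched as a small class. The class SimpleAdaptiveModel and its member names are hypothetical; this is not the package's Input_Data_Model interface, only a minimal 0-order adaptive model that starts with every symbol equally likely.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical 0-order adaptive model over the integers lo..hi.
class SimpleAdaptiveModel {
public:
    // Cumulative-count range [low, high) of a symbol, out of 'total'.
    struct Interval { unsigned low, high, total; };

    SimpleAdaptiveModel(int lo, int hi)
        : lo_(lo),
          counts_(static_cast<std::size_t>(hi - lo + 1), 1u),
          total_(static_cast<unsigned>(hi - lo + 1)) {}

    // Encoder side: the range the coder narrows its interval to.
    Interval interval(int symbol) const
    {
        const std::size_t idx = index_of(symbol);
        unsigned low = 0;
        for (std::size_t i = 0; i < idx; ++i) low += counts_[i];
        return Interval{ low, low + counts_[idx], total_ };
    }

    // Decoder side: the symbol whose range contains 'target' (< total).
    int find(unsigned target) const
    {
        assert(target < total_);
        unsigned high = 0;
        for (std::size_t i = 0; i < counts_.size(); ++i) {
            high += counts_[i];
            if (target < high) return lo_ + static_cast<int>(i);
        }
        return lo_;   // not reached while target < total_
    }

    // Adaptation: make the symbol just seen a little more probable.
    void update(int symbol) { ++counts_[index_of(symbol)]; ++total_; }

private:
    std::size_t index_of(int symbol) const
    {
        assert(symbol >= lo_ &&
               symbol - lo_ < static_cast<int>(counts_.size()));
        return static_cast<std::size_t>(symbol - lo_);
    }

    int lo_;
    std::vector<unsigned> counts_;
    unsigned total_;
};
```

The encoder and the decoder must call update() with the same symbols in the same order, so that both sides always hold identical counts. A production model would also rescale the counts once the total grows too large for the coder's arithmetic; that step is omitted here.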
The current version of the package provides two Input_Data_Model
plug-ins, both performing adaptive "modeling" of a stream of
integers. The first plug-in uses a simple 0-order adaptive prediction
(like the model given in Witten's book). The other one takes a
histogram to sketch the initial distribution, and is a bit more
sophisticated in updating the model. It is used in compressing a
wavelet decomposition of an image. The code below (taken literally
from varithm.cc) demonstrates how the coder classes are actually used.
The first example writes two different streams (with different
patterns, which is why it is better to encode them separately) into
the same file:
    EndianOut stream("/tmp/aa");
    stream.set_littlendian();
    const int sample_header = 12345;
    {
        AdaptiveModel model(-1,4);
        ArithmCodingOut ac(model);
        ac.open(stream);
        for(i=0; i<sizeof(pattern1)/sizeof(pattern1[0]); i++)
            ac.put(pattern1[i]);
    }
    {
        stream.write_long(sample_header);   // write a "header"
        AdaptiveModel model(-1,4);          // followed by the arithmetic coded
        ArithmCodingOut ac(model);          // stream
        ac.open(stream);
        for(i=0; i<sizeof(pattern2)/sizeof(pattern2[0]); i++)
            ac.put(pattern2[i]);
    }
    stream.close();
The reading is similar.
The second example uses a different model plug-in, yet the i/o is similar:
    static void test_adh(void)
    {
        message("\nCreating Histogram ...\n");
        Histogram histogram(-7,7);
        register int i;
        for(i=0; i<MyPattern_size; i++)
            histogram.put(MyPattern[i]);

        message("\nWriting data ...");
        AdaptiveHistModel model(histogram);
        ArithmCodingOut ac(model);
        ac.open("/tmp/aa");
        for(i=0; i<MyPattern_size; i++)
            ac.put(MyPattern[i]);
        ac.close();
        message("\nCoded file /tmp/aa has been created\n");

        AdaptiveHistModel i_model;
        ArithmCodingIn ac1(i_model);
        ac1.open("/tmp/aa");
        for(i=0; i<MyPattern_size; i++)
        {
            register int val_read = ac1.get();
            if( val_read != MyPattern[i] )
                _error("Read value %d of the %d-th integer is not what it is "
                       "supposed to be, %d",
                       val_read, i, MyPattern[i]);
        }
        ac1.get();
        assert( ac1.is_eof() );
    }
---- Convenience Functions
The package defines a few functions I found convenient to use, like
message(...) (which is equivalent to fprintf(stderr,...)) and
_error(...) (the same as message(...) followed by abort()). One doesn't
need to #include <stdio.h> to use them.
***** Grand plans
***** Revision history
Version 2.3 - Jun 1995
Fixed the last remaining incompatibility glitches. Now, exactly the
same code compiles on a Mac with CodeWarrior 6 and on Unix with gcc
2.6.3.
Version 2.2 - May 1995
Added a variable-length (start/stop) coding of signed short integers.
Added dealing with simple histograms of an integer-valued
distribution.
Version 2.1 - Mar 1995
Introducing bool where appropriate (instead of int) and adding checks
to make sure an EndianIn/Out stream was opened successfully.
Version 2.0 - Feb 1995
Big change: splitting EndianIO into EndianIn and EndianOut and
removing all libg++-specific things; everything should be very
portable now. Making sharing of the streambuffer portable.
Version 1.4 - Feb 1994
Updated for libg++ 2.5.3
Version 1.3 - Aug 1993
Introducing attachment of one stream to another, or sharing of a
streambuf among several streams. Took care of properly terminating an
arithm coding stream by writing a few phony bits at the end (so we
won't hit the EOF on reading). Thus it is possible now to concatenate
arithmetic coding streams.
Version 1.2 - Jun 1992
Updated to compile under gcc/g++ 2.2.1 and work with libg++ 2.0. The
first implementation of the arithmetic coding package.
Version 1.1 - Nov 1991 - May 1992
Initial revision